chore: fix flaky langchain tests by hassiebp · Pull Request #1584 · langfuse/langfuse-python

hassiebp · 2026-03-27T14:03:32Z

Disclaimer: Experimental PR review

Greptile Summary

This PR updates assertion counts in tests/test_langchain.py to reflect an increase in the number of observations/generations produced per LangChain run. The increase is likely the result of a dependency version bump (LangChain, LangGraph, or OpenAI SDK) that causes the CallbackHandler to capture additional span or generation events.

Key changes:
- Six assert len(...) calls are raised across test_callback_generated_from_trace_chat, test_openai_instruct_usage, test_link_langfuse_prompts_invoke, test_link_langfuse_prompts_stream, test_link_langfuse_prompts_batch, and test_multimodal.
- The new counts are: observations 2→3 (simple chat/multimodal), 3→4 (instruct batch), generations 2→4 (single-run prompt tests), and 6→10 (batch-of-3 prompt test).
- Three imports (StrOutputParser, Runnable, OpenAI) remain inside the test_openai_instruct_usage function body instead of being moved to the top of the module.
- The newly counted observations/generations in most tests are not verified for content, type, or prompt linkage, leaving it unclear whether they represent expected new spans or unintended artefacts.
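One way to close the verification gap noted in the last bullet is to assert on observation types rather than on a bare total. The following is a minimal sketch, not the repository's code: the `Observation` dataclass, the helper names, and the sample data are hypothetical stand-ins for whatever the test fetches from the trace, and the expected split per test would have to be confirmed against real runs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Observation:
    """Hypothetical stand-in for an observation fetched from a trace."""

    type: str  # e.g. "GENERATION" or "SPAN"
    name: str
    prompt_name: Optional[str] = None


def count_by_type(observations: Iterable[Observation]) -> Counter:
    """Tally observations by their type field."""
    return Counter(o.type for o in observations)


def assert_observation_shape(
    observations: Iterable[Observation], expected: Dict[str, int]
) -> None:
    """Fail with a readable message when the per-type counts differ."""
    actual = count_by_type(observations)
    assert actual == Counter(expected), (
        f"expected {expected}, got {dict(actual)}"
    )


# Illustrative data only; the names and the 2/1 split are assumptions.
observations = [
    Observation("SPAN", "RunnableSequence"),
    Observation("GENERATION", "ChatOpenAI"),
    Observation("SPAN", "StrOutputParser"),
]
assert_observation_shape(observations, {"SPAN": 2, "GENERATION": 1})
```

Compared with `assert len(observations) == 3`, a per-type check fails with a message that names which kind of observation appeared or disappeared, which makes a future dependency bump easier to diagnose.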

Confidence Score: 5/5

Test-only change; safe to merge — all remaining findings are non-blocking style/documentation suggestions.

All findings are P2: one style violation (inline imports), one test-completeness gap (unverified extra observations), and one documentation gap (unexplained batch count). None block correctness or production behaviour.

tests/test_langchain.py — inline imports at lines 254-256 and unverified extra observations across several test functions.

Important Files Changed

Filename	Overview
tests/test_langchain.py	Updates six observation/generation count assertions to reflect new behavior producing more spans per LangChain run; contains inline imports inside a test function and leaves newly-added observations unverified.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LangChain chain.invoke / stream / batch] --> B[CallbackHandler captures events]
    B --> C{Observation Type?}
    C -->|GENERATION| D[LLM call observation e.g. ChatOpenAI, OpenAI]
    C -->|SPAN| E[Wrapping / chain span]
    D --> F[Langfuse trace]
    E --> F
    F --> G[API: trace.observations]
    G --> H{Count assertions}
    H -->|invoke / stream| I[expect 4 generations, was 2]
    H -->|batch x3| J[expect 10 generations, was 6]
    H -->|simple chat| K[expect 3 observations, was 2]
    H -->|instruct batch x2| L[expect 4 observations, was 3]

Reviews (1): Last reviewed commit: "push"

Greptile also left 1 inline comment on this PR.

Context used:

Rule used - Move imports to the top of the module instead of p... (source)

Learnt From
langfuse/langfuse-python#1387

github-actions · 2026-03-27T14:03:42Z

@claude review

tests/test_langchain.py

push

c8424dc

hassiebp changed the title ~~push~~ chore: fix flaky langchain tests Mar 27, 2026

greptile-apps bot reviewed Mar 27, 2026

tests/test_langchain.py (resolved)

claude bot reviewed Mar 27, 2026

tests/test_langchain.py (resolved)

push

ba3efbf

hassiebp enabled auto-merge (squash) March 27, 2026 14:28

hassiebp merged commit 840cf2a into main Mar 27, 2026
13 checks passed

hassiebp deleted the fix-ci branch March 27, 2026 15:15

claude bot mentioned this pull request Mar 28, 2026

chore(deps-dev): bump langchain-core from 1.2.11 to 1.2.22 #1586

Merged

hassiebp merged 2 commits into main from fix-ci
